Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW

Authors

  • Teng Ma
  • Aurelien Bouteiller
  • George Bosilca
  • Jack J. Dongarra
Abstract

Collective communication is one of the most powerful message passing concepts, enabling parallel applications to express complex communication patterns while allowing the underlying MPI library to provide efficient implementations that minimize the cost of data movement. However, with the increase in heterogeneity inside the nodes, more specifically in the memory hierarchies, harnessing the maximum compute capabilities becomes increasingly difficult. This paper investigates the impact of kernel-assisted MPI communication on two scientific applications: 1) Car-Parrinello molecular dynamics (CPMD), a chemical molecular dynamics application, and 2) FFTW, a Discrete Fourier Transform (DFT) library. By focusing on their usage of the Message Passing Interface (MPI), we identify the communication characteristics and patterns of each application. Our experiments indicate that the quality of the collective communication implementation on a specific machine plays a critical role in overall application performance.


Related resources

Fourier Transforms for the BlueGene/L Communication Network

A computational kernel of particular importance for many scientific applications is the Fast Fourier Transform (FFT) of multi-dimensional data. A fundamental challenge is the design and implementation of such parallel numerical algorithms to utilise efficiently thousands of nodes. The BlueGene/L is a massively parallel high performance computer organised as a three-dimensional torus of compute ...


Optimizing Collective Communication in OpenSHMEM

Message Passing Interface (MPI) has been the de-facto programming model for scientific parallel applications. However, data driven applications with irregular communication patterns are harder to implement using MPI. The Partitioned Global Address Space (PGAS) programming models present an alternative approach to improve programmability. OpenSHMEM is a library-based implementation of the PGAS m...


3D FFT with 2D decomposition

Many scientific applications, including molecular dynamics (MD), require a fast Fourier transform (FFT). As the number of processors in high-performance computers increases, this transform has to be parallelized across a larger number of processors so that it does not become a bottleneck for the parallelization. This requires the decomposition to be changed from 1D to 2D. Such a 2D decomposed 3D FFT was implemente...


Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer

This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Adv...


Design and Performance Evaluation of LiMIC (Linux Kernel Module for MPI Intra-node Communication) on InfiniBand Cluster

High performance intra-node communication support for MPI applications is critical for achieving the best performance out of clusters of SMP workstations. Although the performance of system area networks has improved in the recent years, intra-node communication still remains orders of magnitude faster than the network. Present day MPI stacks cannot make use of operating system kernel support f...




Publication date: 2011